Abstract:

Increasing cache latency in next-generation processors has a profound performance
impact in spite of advanced out-of-order execution techniques. One way to circumvent
this cache latency problem is to predict load values at the onset of pipeline execution 
by exploiting either the load value locality or the address correlation of stores and 
loads. In this paper, we describe a new load value speculation mechanism based on the 
program syntax correlation of stores and loads. We establish a symbolic cache (SC),
which is accessed in early pipeline stages to achieve a zero-cycle load. Instead of using 
memory addresses, the SC is accessed by the encoding bits of base register ID plus the 
displacement directly from the instruction code. Performance evaluations using SPEC95 
and SPEC2000 integer programs on SimpleScalar simulation tools show that the SC 
achieves higher prediction accuracy in comparison with other load value speculation 
methods, especially when hardware resources are limited. 
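
The indexing idea in the abstract above can be illustrated with a minimal sketch. It is a toy software model, not the paper's hardware design: the class name `SymbolicCache`, its method names, and the policy of dropping entries when a base register is overwritten are all assumptions made for illustration. The only detail taken from the abstract is that entries are keyed by the base register ID plus the displacement from the instruction encoding rather than by a memory address.

```python
from typing import Dict, Optional, Tuple


class SymbolicCache:
    """Toy value cache keyed by (base register ID, displacement).

    Hypothetical model: names and the invalidation policy are assumptions.
    """

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, int], int] = {}

    def on_store(self, base_reg: int, disp: int, value: int) -> None:
        # A store records its value under the symbolic (register, offset) key.
        self._entries[(base_reg, disp)] = value

    def on_register_write(self, reg: int) -> None:
        # Assumption: once the base register changes, the same encoding bits
        # may name a different location, so matching entries are dropped.
        stale = [key for key in self._entries if key[0] == reg]
        for key in stale:
            del self._entries[key]

    def predict_load(self, base_reg: int, disp: int) -> Optional[int]:
        # A load with the same encoding bits gets the stored value as its
        # prediction; None means no prediction is available.
        return self._entries.get((base_reg, disp))


sc = SymbolicCache()
sc.on_store(base_reg=29, disp=16, value=42)   # e.g. a store to 16(r29)
hit = sc.predict_load(base_reg=29, disp=16)   # a later load from 16(r29)
sc.on_register_write(29)                      # r29 is overwritten
miss = sc.predict_load(base_reg=29, disp=16)
```

Because the key comes straight from the instruction bits, the lookup needs no address computation, which is what allows it to happen in the early pipeline stages. A predicted value would still have to be verified against the real memory access.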

Index Terms: 

cache storage; integer programming; microprocessor chips; pipeline processing; storage
allocation; SPEC2000; SPEC95; SimpleScalar simulation; address-free memory access; base
register ID; cache latency; displacement value; instruction code; integer programming; load value
ABSTRACT:
A data processing apparatus, a computer, an article including a machine-accessible medium, and a
method of processing data are disclosed. The data processing apparatus may include a pair of 
pipelines sharing an instruction cache, data cache, and a branch predictor with the second 
pipeline running ahead of the first pipeline using a data value prediction module. The pipelines 
may be included in one or more processors and coupled to a memory to form a computer. The method 
includes executing a plurality of instructions using the pipeline pair, such that when a cache 
miss is encountered by the second pipeline during execution of a LOAD instruction, the data value 
prediction module supplies a predicted load value in lieu of a cached value, enabling continued 
execution of the plurality of instructions by the second pipeline without waiting for the return 
of the cached value. 
on Programming language design and implementation, volume 37 issue 5

While caches are effective at avoiding most main-memory accesses, the few remaining 
memory references are still expensive. Even one cache miss per one hundred accesses can 
double a program's execution time. To better tolerate the data-cache miss latency, architects 
have proposed various speculation mechanisms, including load-value prediction. A load-value 
predictor guesses the result of a load so that the dependent instructions can immediately 
proceed without having to wait for the memory access ... 
Value locality, a recently discovered program attribute that describes the likelihood of the
recurrence of previously-seen program values, has been studied enthusiastically in the recent 
published literature. Much of the energy has focused on refining the initial efforts at 
predicting load instruction outcomes, with the balance of the effort examining the value 
locality of either all register-writing instructions, or a focused subset of them. Surprisingly, 
there has been very little publish ... 
This paper studies an interesting yet less explored behavior of memory access instructions,
called access region locality. Unlike traditional temporal and spatial data locality, which
focuses on individual memory locations and how accesses to those locations are interrelated,
access region locality concerns each static memory instruction and its range of
access locations at run time. We consider a program's data, heap, and stack regions in this
paper. Our experimental study ... 
This paper presents a technique called register value prediction (RVP) which uses a type of
locality called register-value reuse. By predicting that an instruction will produce the value 
that is already stored in the destination register, we eliminate the need for large value buffers 
to enable value prediction. Even without the large buffers, register-value prediction can be 
made as or more effective than last-value prediction, particularly with the aid of compiler 
management of values in the re ... 
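
A minimal sketch of the prediction rule described above, under stated assumptions: the function name `rvp_hits`, the trace format of `(destination register, produced value)` pairs, and the all-zero initial register file are hypothetical, and every instruction is predicted unconditionally, with no confidence filtering or compiler assistance. The sketch only shows that the prediction is read from the destination register itself, so no separate value buffer is needed.

```python
from typing import Iterable, List, Tuple


def rvp_hits(trace: Iterable[Tuple[int, int]], num_regs: int = 32) -> List[bool]:
    """Return, per instruction, whether register-value prediction was right.

    Assumptions: registers start at zero; each trace item is
    (destination register index, value the instruction actually produced).
    """
    regs = [0] * num_regs
    hits = []
    for dest, result in trace:
        # The prediction is whatever the destination register already holds.
        hits.append(regs[dest] == result)
        # The instruction then writes its real result.
        regs[dest] = result
    return hits


# r1 is written with 5 twice, r2 is written with 0, then r1 changes to 7.
example = rvp_hits([(1, 5), (1, 5), (2, 0), (1, 7)])
# example == [False, True, True, False]
```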
The predictability of data values is studied at a fundamental level. Two basic predictor models
are defined: Computational predictors perform an operation on previous values to yield 
predicted next values. Examples we study are stride value prediction (which adds a delta to a 
previous value) and last value prediction (which performs the trivial identity operation on the 
previous value); Context Based predictors match recent value history (context) with
previous value history and predict values ... 

Keywords: Prediction, Value Prediction, Context Based Prediction, Stride Prediction, Last 
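
The two predictor models named in the abstract above can be sketched as follows. This is an illustrative toy, not the paper's evaluated design: the function names, the use of `None` for "no prediction yet", the fixed context order of 2, and the choice to remember only the most recent successor of each context are all assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def last_value_predictions(values: Sequence[int]) -> List[Optional[int]]:
    """Computational predictor: identity operation on the previous value."""
    preds: List[Optional[int]] = []
    last: Optional[int] = None
    for v in values:
        preds.append(last)
        last = v
    return preds


def stride_predictions(values: Sequence[int]) -> List[Optional[int]]:
    """Computational predictor: previous value plus the last observed delta."""
    preds: List[Optional[int]] = []
    last: Optional[int] = None
    delta: Optional[int] = None
    for v in values:
        if last is None or delta is None:
            preds.append(None)          # need two values to know a delta
        else:
            preds.append(last + delta)
        if last is not None:
            delta = v - last
        last = v
    return preds


def context_predictions(values: Sequence[int], order: int = 2) -> List[Optional[int]]:
    """Context-based predictor: look up what followed the same recent history.

    Assumptions: order >= 1; the table keeps only the latest successor.
    """
    preds: List[Optional[int]] = []
    table: Dict[Tuple[int, ...], int] = {}
    history: Tuple[int, ...] = ()
    for v in values:
        if len(history) == order:
            preds.append(table.get(history))
            table[history] = v
        else:
            preds.append(None)
        history = (history + (v,))[-order:]
    return preds


lv = last_value_predictions([10, 20, 30, 40])
st = stride_predictions([10, 20, 30, 40])
cx = context_predictions([1, 2, 3, 1, 2, 3, 1, 2])
```

On the arithmetic sequence the stride predictor is correct from the third value onward while the last-value predictor never is. On the repeating sequence the context-based predictor becomes correct once each two-value context has been seen once.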
Since the introduction of virtual memory demand-paging and cache memories, computer
systems have been exploiting spatial and temporal locality to reduce the average latency of a 
memory reference. In this paper, we introduce the notion of value locality, a third facet of 
locality that is frequently present in real-world programs, and describe how to effectively 
capture and exploit it in order to perform load value prediction. Temporal and spatial locality 
are attributes of storage I ... 
In contemporary out-of-order superscalar design, high IPC is mainly achieved by exposing
high instruction level parallelism (ILP). Scaling issue window size can certainly provide more 
ILP; however, future processor scaling demands threaten to limit the size of the issue 
window. In this study, we propose a dynamic instruction sorting mechanism that provides 
more ILP without increasing the size of the issue window. In our approach, early in the 
pipeline, we predict how long an instruction needs to ... 

Keywords: CLP, LHT, MNM, SILO, instruction sorting 